The nth-Order Bias Optimality for Multichain Markov Decision Processes

Authors

  • Xi-Ren Cao
  • Junyu Zhang
Abstract

The paper proposes a new approach to the theory of Markov decision processes (MDPs) with average performance criteria and finite state and action spaces. Using the average performance and bias difference formulas derived in this paper, we develop an optimization theory for average performance (or gain) optimality, bias optimality, and all high-order bias optimality in a unified way. The approach is simple, direct, natural, and intuitive; it does not depend on the Laurent series expansion or on discounted MDPs. We also propose one-phase policy iteration algorithms for bias and high-order bias optimal policies, which are more efficient than the two-phase algorithms in the literature. Furthermore, we derive the high-order bias optimality equations. This research is part of our effort to develop a sensitivity-based learning and optimization theory. The new insights provided by this approach may lead to new research directions such as on-line learning, performance-derivative-based optimization, and potential or high-order potential aggregation.
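To make the starting point concrete, the gain difference formula at the core of this approach can be sketched in its unichain form (the notation below is an assumption for illustration, not the paper's own): for two policies with transition matrices P, P', reward vectors f, f', steady-state distribution \pi' of the second policy, and performance potential g of the first,

    \eta' - \eta = \pi' \bigl[ (f' + P' g) - (f + P g) \bigr].

Since \pi' is componentwise positive for an ergodic chain, any policy that improves the bracketed term state by state improves the gain, which is what makes policy iteration a direct consequence of the formula. In the multichain setting treated in the paper the gain is a vector and the difference formula carries additional terms.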


Similar articles

Continuous-time Markov decision processes with nth-bias optimality criteria

In this paper, we study the nth-bias optimality problem for finite continuous-time Markov decision processes (MDPs) with a multichain structure. We first provide nth-bias difference formulas for two policies and present some interesting characterizations of an nth-bias optimal policy by using these difference formulas. Then, we prove the existence of an nth-bias optimal policy by using nth-bias...
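For reference, the discrete-time analogues of these quantities can be sketched as follows (notation assumed here, not taken from the cited paper): for a policy with transition matrix P, Cesàro-limit matrix P^*, reward vector f, and gain g = P^* f, the bias g_1 and the higher-order biases are characterized by the Poisson-type equations

    (I - P) g_1 = f - g, \qquad P^* g_1 = 0,
    (I - P) g_{n+1} = -g_n, \qquad P^* g_{n+1} = 0, \quad n \ge 1.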


Bias Optimality for Multichain Markov Decision Processes

In recent research we find that the policy iteration algorithm for Markov decision processes (MDPs) is a natural consequence of the performance difference formula, which compares the performance of two different policies. In this paper, we extend this idea to the bias-optimal policy of MDPs. We first derive a formula that compares the biases of any two policies which have the sa...
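As an illustration of that idea, here is a minimal sketch of gain-optimal policy iteration for a unichain finite MDP (all names are hypothetical; this is the plain average-reward scheme, not the bias-optimal extension developed in these papers):

```python
import numpy as np

def evaluate(P, r):
    """Gain g and potential (bias) w of a fixed policy.

    Solves (I - P) w + g * 1 = r with the normalization w[0] = 0.
    """
    n = P.shape[0]
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[:n, :n] = np.eye(n) - P   # (I - P) w
    A[:n, n] = 1.0              # + g * 1
    A[n, 0] = 1.0               # normalization: w[0] = 0
    b[:n] = r
    x = np.linalg.solve(A, b)
    return x[n], x[:n]          # gain (scalar), potential vector

def policy_iteration(P, r):
    """P[a, s, :] and r[a, s]: transition rows and rewards per action."""
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        Pd = P[policy, np.arange(n_states)]  # transition rows under policy
        rd = r[policy, np.arange(n_states)]
        gain, w = evaluate(Pd, rd)
        # Improvement step: by the performance difference formula,
        # increasing r(s, a) + P(s, a) . w at any state increases the gain.
        q = r + P @ w                        # q[a, s]
        new_policy = policy.copy()
        for s in range(n_states):
            a = int(q[:, s].argmax())
            if q[a, s] > q[policy[s], s] + 1e-10:  # strict improvement only
                new_policy[s] = a
        if np.array_equal(new_policy, policy):
            return policy, gain, w
        policy = new_policy
```

Roughly speaking, the one-phase bias-optimal algorithms replace the single improvement test above with a lexicographic one over the gain, the bias, and the higher-order biases.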


Sensitive Discount Optimality via Nested Linear Programs for Ergodic Markov Decision Processes

In this paper we discuss sensitive discount optimality for Markov decision processes. The n-discount optimality is a refined selective criterion that is a generalization of the average optimality and the bias optimality. Our approach is based on a system of nested linear programs. In the last section we provide an algorithm for the computation of the Blackwell optimal policy. The n-disco...


A Probabilistic Analysis of Bias Optimality in Unichain Markov Decision Processes

Since the long-run average reward optimality criterion is underselective, a decision-maker often uses bias to distinguish between multiple average optimal policies. We study bias optimality in unichain, finite state and action space Markov Decision Processes. A probabilistic approach is used to give intuition as to why a bias-based decision-maker prefers a particular policy over another. Using rel...


A probabilistic analysis of bias optimality in unichain Markov decision processes

This paper focuses on bias optimality in unichain, finite state and action space Markov Decision Processes. Using relative value functions, we present new methods for evaluating optimal bias. This leads to a probabilistic analysis which transforms the original reward problem into a minimum average cost problem. The result is an explanation of how and why bias implicitly discounts future rewards.



Journal:
  • IEEE Trans. Automat. Contr.

Volume 53, Issue -

Pages -

Publication date: 2008